Explaining population booms and busts in Mid-Holocene Europe

Archaeological evidence suggests that the population dynamics of Mid-Holocene (Late Mesolithic to Initial Bronze Age, ca. 7000–3000 BCE) Europe are characterized by recurrent booms and busts of regional settlement and occupation density. These boom-bust patterns are documented in the temporal distribution of 14C dates and in archaeological settlement data from regional studies. We test two competing hypotheses attempting to explain these dynamics: climate forcing and social dynamics leading to inter-group conflict. Using the framework of spatially-explicit agent-based models, we translated these hypotheses into a suite of explicit computational models, derived quantitative predictions for population fluctuations, and compared these predictions to data. We demonstrate that climate variation during the European Mid-Holocene is unable to explain the quantitative features (average periodicities and amplitudes) of observed boom-bust dynamics. In contrast, scenarios with social dynamics encompassing density-dependent conflict produce population patterns with time scales and amplitudes similar to those observed in the data. These results suggest that social processes, including violent conflict, played a crucial role in the shaping of population dynamics of European Mid-Holocene societies.


Climate-based variation
For each cell, a time-dependent measure of relative agricultural productivity is calculated based an agriculture yield emulator model of Franke et al. [13] which we, in turn, drive using detailed (yearly) past climate data of Armstrong et al. [14] over a subset of the time period of their dataset (10,000 BP to 2,500 BP). The relative productivity for each cell is normalized by average productivity for the entire period and these temporal z-scores are applied as a multiplicative factor to arrive at yearly relative yield estimates.

Yield emulator
Franke et al. [13] provide fitted polynomial models to estimate agricultural yield potential as a function of climatic conditions. They built their models by running established agricultural productivity emulator models for a set of assumed present and Figure S1: Typical regional variations in mean temperatures (top row), rainfall (middle row) and estimated agricultural productivity (bottom row). Left panels display time series of regional mean temperature; the black lines show 30 year moving averages. Right panels show the autocorrelation of the same. We see that the patterns are dominated by short-term variations along with some gradual long-term trends; no periodic components are present. Figure S2: Areas covered by the climate data used in this study. Figure was generated with R (version 4.1.2, available at https://www.r-project.org/), using the ggplot2 package (version 3.3.5, available at https://ggplot2.tidyverse.org/). Base map data, including modern country borders was obtained using the rworldmap package (version 1.3-6, available at http://cran.r-project.org/web/packages/rworldmap) and is based on public domain data from the Natural Earth project, available at https://www.naturalearthdata.com/.
future climate variable combinations, and then fit a third-order polynomial to reproduce those detailed estimates. These polynomial models thus rapidly estimate the results from the otherwise prohibitably expensive original emulators. Polynomial models are provided for 9 different agricultural emulators (hence 9 parameter sets) with each emulator/parameter set targeting five crop types. In this work we used the polynomial model that was fitted for the LPJmL emulator [15] for spring and winter wheat. Our analysis could be easily extended to include models for additional emulators and crop types.
For each estimator and crop type, Franke et al. fitted a polynomial model independently in each one of thousands of 0.5 degree grid cells covering the land areas of the Earth. These (several million) fitted parameter values can be used to estimate agricultural yields world-wide given a wide possible range of climate parameters. These parameters are provided in a public repository for download [16].
Each polynomial model has four input variables: atmospheric carbon dioxide concentration (C), temperature (T), rainfall (W) and nitrogen application (N). Values for C, T and W are assumed to be averages during the growing season; within-year variation is assumed to follow the typical patterns observed in the 31-year baseline period between 1980 and 2010. Values for C are given in units of part per million (ppm), while the T and W input variables are interpreted as absolute and relative difference from the 31-year average data in the baseline period, according to the AgMERRA dataset [17,18]. Setting each input value to their default value (i.e. C = 360 ppm, T = 0, W = 1, and N = 200 kgha −1 ) allows the calculation of the agricultural productivity in a cell during the baseline period.

Climate data
Climate data for the baseline time period  is provided as part of the AgMERRA dataset in a rectangular grid, at a resolution of 0.25 degrees as daily values [17,18]. For the purpose of the current study, we calculated average values of temperature and rainfall over the whole baseline time period in a 0.5 degree resolution grid consistent with the yield dataset. A more precise estimate would consider the growing period for each crop type (as different magnitudes of change can occur at different times of the year).
Armstrong et al. [14] provide historical climate data for the northern hemisphere on a 0.5 degree resolution grid for the past 60,000 years. The data includes temperature and rainfall data at a temporal resolution of one month, allowing a potential extension for month-by-month comparison with the baseline climate data. In the current study, we calculate yearly averages for the study period between 8,050 BCE to 550 BCE, i.e. 2,500 to 10,000 BP. While the main focus of our simulations is the time period between 7,000 to 3,000 BCE, we generated estimates of agricultural productivity in a longer time period to be able to vary the simulation start date and to avoid edge effects at the end of the simulation when employing a sampling procedure to generate simulated SPDs.
The dataset covers most of the land area of Europe as shown in Fig. S2. In areas without climate data, we assumed a constant value of agricultural productivity, as estimated from the GAEZ yield dataset. These areas were included in the simulation, but they were excluded from the analysis of population time series so as not to add bias to the results.
We a reconstructed timeseries of atmospheric CO2 concentrations published by the EPA [19], performing a linear interpolation to get a yearly time series. Finally, we set the nitrogen application level to zero; additional insight about the impact of fertilizer use (e.g. manuring) could be incorporated in our model by varying this level over time.  Figure S3: Frequency distribution of the location of the first minimum in the ACF of regional time series of temperature, rainfall and estimated agricultural productivity.

Temporal patterns in climate and agricultural productivity
To better understand potential direct effect of temporal patterns in climate on population, we analyzed regional time series of climate variables (temperature and rainfall) and estimated agricultural productivity variations in a similar fashion to our simulated population data. Figure S1 shows examples of regional time series (using the same aggregation scheme as for the radiocarbon dates and simulation results in our study). We display aggregate distribution of ACF minimum locations in Fig. S3 for climate variables and estimated agricultural productivity. In accordance with the patters seen in Fig. S1, the large majority of ACF minimums is detected for very short times (i.e. below 100 years), indicating that there are no strong cyclic patterns present.

Spread of farming in Europe
As an additional validation of our model, we compared the simulation results to archaeologically established dates of settlement by farmers throughout Europe. We used an estimation of regional settlement patterns from a previous study [20], displayed in Fig. S4. This dataset contains a set of 33 regions, with 19 unique dates of settlement. For each region, we identified the set of cells in our simulation space contained in it, and identified the first time (in simulation years) when a given p share of the cells in each region was settled (with p = 0.1 for the model variant without conflict, and p = 0.02 for the variant including conflict to account for the lower total share of cells settled).
We show these times along with a fitted linear relationship in Fig. S5 for model variants both excluding and including violent conflict (in the latter case, we used model parameters p E = 0.1, p A = 0.1). We varied main parameters G (i.e. the characteristic distance of migrations), s (scale of climate-based variability of agriculture) and the simulation start date. We see that a good linear correspondence can be established, with the slope varying according to the G and s parameters. The approximately linear relationship shows that settlement patterns are reproduced reasonably well in the simulation, while a    Table S2: Fitted coefficients of a linear relationship between archaeologically estimated and simulated settlement dates of a set of regions in Europe (model variant with conflict, p E = 0.1 and p A = 0.1). A slope of 1 would mean a perfect correspondence; slopes below 1 indicate a faster, slopes above 1 indicate a slower spread of farmers in the simulation than expected. Notably, for higher values of s (presumed variation of agricultural productivity due to climate) the spread of farming slows down significantly. This can be compensated by using larger values of G. slope close to 1 indicates a correspondence in the actual speed of spread. In Tables S1 and S2, we show the fitted slope of the linear relationship between simulated and archaeologically established settlement dates. Depending on the model variant and the value of s, different G parameters give the closest approximation of the true spread of farming in Europe. In the main analysis, we have used a simulation start date of 7,000 BCE, s = 4, and G = 40 km for the model variant without conflict, and G = 80 km for the model variant including conflict.

Disaggregated results for 14C data
In Fig. S6, we show ACF and CV distributions for each of the grid tiling positions used in our study for the 14C dataset.

Regional results
In Figs. S8 and S9, we display typical regional population time series and their ACFs estimated from the 14C dataset in one tiling position. In Fig. S8, we display the computed SPD, the fitted logistic growth time series and the confidence interval obtained from generating 1,000 synthetic datasets based on the fitted time series. Fig. S9 shows the ACFs of the detrended SPD time series, i.e. of the time series obtained after subtracting the mean SPD of synthetic datasets from the SPD of the 14C dataset.
In Fig. S10-S13, we display regional population time series and ACFs for typical simulation results in one tiling position. Figs. S10 and S11 show population and ACF for the simulation variant without conflict (i.e. with p E = 0), but including climate variation; figs. S12 and S11 show population and ACF for the simulation variant that includes both conflict and climate variation components.

Variation of main parameters
In Figs. S14-S17, we show results (in terms of the distribution of ACF minima and CV values) in the three main simulation variants while varying the values of the most important parameters (additional parameter variations are presented in the next section). All of these results are based on 100 repeated realizations of the simulation with the given parameter values and compiling aggregate histograms of the results among these.
In Fig. S14, we show these results for the simulation variant without conflict (i.e. p E = 0), but including climate variations. In this case, we test variations in the s and G parameters.
In Figs. S15 and S16, we show ACF minima and CV distributions for the main simulation variant that includes both conflict and climate variation. In Fig. S15, we used s = 2 and G = 80 km and varied the p E and p A parameters. We see that the mode mode and shape of distributions changes gradually with the parameters. In   Figure S8: Regional SPDs computed from radiocarbon dates. Blue lines indicate the fitted logistic trend for SPDs, showing typical S-shaped patterns that describe well the overall trend in most cases. Red areas show the 95% confidence intervals resulting from 1,000 synthetic SPDs. The p-values are estimated as the share of synthetic SPDs with a higher total z-score outside of the 95% interval than the real data [21,22]. We see boom and bust patterns in most regions, with typical timescales of 500-1,000 years.  Figure S9: ACFs computed from regional SPDs of radiocarbon dates after detrending. Patterns correspond well to typical timscales of 500-1,000 years for demographic processes.    years, and varied the s and G parameters. In this case, the mode of the distributions does not change significantly, while the shape is affected to some degree: for lower values of s, there is less variance in ACF minima. In Fig. S17, we show ACF minima and CV distributions for the model variant where density-dependent conflicts happen (i.e. p E > 0), but do not lead to the creation of aggressors (p C = 0; there is no second-order dynamics). We used a parameter of G = 80 km and varied the p E and s parameters.  Figure S17: Distribution of the location of the first minimum in regional ACFs (top) and coefficients of variation (bottom), model with conflict and climate, but no aggressors (i.e. first-order interactions), G = 80 km.

Additional model variants
Beyond the main model variant, we have investigated the effect of three further choices on model behavior: • Alternate fission probabilities: instead of a linear dependence on population (Eq. (2) in the main text), we employed the logistic model of group fission from Ref. [23]. Note that even in this case, we scale the maximum fission probability to 0.125, i.e. a group fission is expected to occur once every 8 years. We denote the original variant by (A) and the alternate case by (B).
• Initial group migration for aggressors: contrary to the original model where selecting an already occupied cell as a migration target results in a conflict without migration, we have considered an alternate case when a migration still happens, and newly created aggressors take over the target cell. After this, they still remain in-place (contrary to the alternate, "roaming aggressors" model variant). We denote the original variant by (C) and the alternate case by (D).
• Aggressors do not consider the population of potential target cells: contrary to the original model where aggressors prefer to attack cells with larger population, in this alternate variant, target cells' population plays no role in their selection. Note that the probability of winning a conflict is unaffected. We denote the original variant by (E) and the alternate case by (F).
The combination of these gives in total 8 possible model variants, denoted by the combination of letters defined above, e.g. ACE denotes the original model variant, while BCF denotes a version where fission probabilities use the logistic form and aggressors are not affected by target cells' population in their choices. We display main results for these model variants for one combination of parameters (G = 80 km, s = 2, p A = 0.1 and p E = 0.2) in Fig. S18. Each panel shows results from 100 repeated simulation realizations. Based on these results, we conclude that differences among these model variants are insignificant.

Impact of varying additional parameters on ACF distributions
In Figs. S19-S41, we display additional results for the distribution of ACF minima where we additionally varied the s and G parameters. All of these results are based on 100 repeated realizations of the simulation with the given parameter values and compiling aggregate histograms of the results among these. Figs  Relative frequency Figure S41: Distribution of the location of the first minimum in regional ACFs, model with conflict but no aggressors (i.e. firstorder interactions), G = 70 km.